Estimation of Latency of an API

Learn the factors involved in calculating the API latency induced by the network.

We know that the response time of an API consists of network latency and processing time, as depicted in the following equation:

Time_{response} = Time_{latency} + Time_{processing}

We calculated the processing time of an API in the previous lesson. Now, let's estimate the latency of an API for different HTTP methods in the subsequent sections.
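Throughout this lesson, we'll plug measured numbers into the equation above. As a trivial sketch (the function name is ours, and all values are in milliseconds):

```python
def response_time_ms(latency_ms: float, processing_ms: float) -> float:
    """Total response time = network latency + server processing time."""
    return latency_ms + processing_ms

# Hypothetical values: 120 ms network latency, 4 ms server processing
print(response_time_ms(120.0, 4.0))  # → 124.0
```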

Introduction#

Latency (also known as network latency) is the message propagation time between a client and a server. To estimate the total response time of a service, we need to estimate its latency. Latency arising from the network (such as the global Internet) is an important factor in the design of an API, because it answers the following key questions:

  • Is our service usable across different parts of the world?

  • What is the maximum time we can take to process a request (that is, the back-end processing time)? If network latency is high for some clients, the back-end services will have less time to complete the processing.

In a request-response architecture, the aim is to store or retrieve data from a server. This architecture usually performs CRUD operations, which we earlier mapped to HTTP methods (verbs) like POST, GET, PUT, and DELETE. Although each method has its importance, all the methods follow the same structure and request pattern. We can generally categorize these methods into two types of requests:

  • Pull request: The type of request where data is requested from the server. The GET method is an example of such a request. For such requests, the server's response message size is usually large.

  • Push request: The type of request where we want to save data to the server or request an operation from the server side. Examples of such requests are POST, PUT, and DELETE. Generally, these methods have a larger request message size.

Request and response of GET and POST

From the discussion above, we deduce that we can take one type of HTTP method from each category to perform our calculations. Therefore, we’ll perform our calculations on GET and POST methods in the subsequent sections. Analyzing the responses to these two methods will enable us to design a system with targeted latency for any of the request types listed above.

Note: For our production system's monitoring, we'll observe all kinds of requests (not only GET and POST). But in this chapter, to learn how such estimation works, we've picked one request of each type as a representative because, in principle, the other types behave similarly.

For latency, it is important to know the size of the data being stored or retrieved, because message size affects how long an API takes to move data. It is therefore safe to model network latency as a function of several parameters, such as message size (bigger messages take longer to transmit) and the distance between the client and the server (the farther apart the two peers, the longer the propagation delay). Since back-of-the-envelope calculations can be based on data size (along with other factors such as transmission and propagation delay), we'll treat it as the main factor when estimating the latency of an API service.
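As a rough sketch of this model, one-way latency can be approximated as transmission delay plus propagation delay. The bandwidth and signal-speed figures below are our own illustrative assumptions, not measurements:

```python
def estimate_latency_ms(message_bytes: int,
                        distance_km: float,
                        bandwidth_bps: float = 25e6,
                        propagation_kmps: float = 200_000) -> float:
    """Back-of-the-envelope one-way latency: transmission + propagation delay.

    Assumed values (ours, for illustration): 25 Mbps effective bandwidth and
    a signal speed of ~200,000 km/s in fiber (about 2/3 the speed of light).
    Queuing and per-router processing delays are ignored.
    """
    transmission_s = (message_bytes * 8) / bandwidth_bps   # time to push bits onto the wire
    propagation_s = distance_km / propagation_kmps         # time for the signal to travel
    return (transmission_s + propagation_s) * 1000

# A 28 KB response traveling ~12,000 km (≈7,500 miles), one way:
print(round(estimate_latency_ms(28_000, 12_000), 2))  # → 68.96
```

Real measurements, like the Postman runs used in this lesson, fold in queuing delays and handshake overheads that this sketch ignores.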

The GET request latency#

In this section, we estimate the latency of a GET request. A client requests the top 100 posts from a server located in a different region. The response, which contains only textual data, is approximately 28 KB. A sample response time of the GET request, obtained through the Postman tool, is provided below.

The GET request time response (first request)

Note: The preparation time and the process time in gray text above are the times taken by the software (Postman) to prepare the request and process the response. For a real application, similar times might be incurred by the client application (for example, a browser) to get ready and to process the received data. However, we'll exclude them from our calculations below.

The total latency time is calculated using the following formula:

Time_{latency} = Time_{base} + RTT_{get} + Time_{download} \qquad \qquad (1)

The transfer start time in the image above, provided by Postman, includes the RTT of the GET request and the processing time taken by the server. Therefore, we have to subtract the server's processing time from the transfer start time to obtain RTT_{get} for the latency estimate, using the following formula:

RTT_{get} = Transfer\ start - Time_{processing} \qquad \qquad (2)

The processing time is 4 ms for a simple request (taking the minimum), as calculated in the previous lesson. Let's update equations (2) and (1) accordingly. Remember that the base time includes the DNS lookup and the TCP and SSL handshakes.

RTT_{get} = 116.18\ ms - 4\ ms = 112.18\ ms \\ Time_{latency} = 214.73\ ms + 112.18\ ms + 4.51\ ms = 331.42\ ms \qquad (3)
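Plugging the measured Postman values into equations (2) and (1) can be checked with a few lines of Python (the variable names are ours; all values in milliseconds come from the screenshot above and the previous lesson):

```python
# Measured values (ms)
transfer_start = 116.18   # includes RTT + server processing
processing = 4.0          # minimum processing time from the previous lesson
base = 214.73             # DNS lookup + TCP and SSL handshakes
download = 4.51           # time to download the 28 KB response body

rtt_get = transfer_start - processing        # equation (2)
latency = base + rtt_get + download          # equation (1)
print(round(rtt_get, 2), round(latency, 2))  # → 112.18 331.42
```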

Note: While the number above seems quite high for a simple GET request, HTTP uses the Connection: keep-alive header to reuse the established connection between the client and server. Therefore, the response time drops substantially for subsequent requests, as we see next.

Latency of cached response#

The calculations above are realistic, but they represent only the initial communication between the client and the server (a cold start). In reality, the base time is incurred only on the first request and is effectively cached for subsequent GET requests, as shown below. This section recomputes the numbers to obtain a pragmatic latency for GET requests after the initial cold start.

The response time of GET request (subsequent requests)

By putting values in equation (1) given above, we can calculate the latency of the request:

Time_{latency} = (0\ ms) + (125.5\ ms - 4\ ms) + 6.82\ ms = 128.32\ ms
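The warm-path arithmetic, where the base time drops to zero, can be wrapped in a small helper (the function name is ours):

```python
def cached_get_latency_ms(transfer_start: float,
                          processing: float,
                          download: float) -> float:
    """Equation (1) with Time_base = 0: the base cost is paid only once."""
    rtt = transfer_start - processing
    return rtt + download

# Measured values (ms) for the subsequent GET request
print(round(cached_get_latency_ms(125.5, 4.0, 6.82), 2))  # → 128.32
```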

Point to Ponder

Question

You might have noticed that the transfer start time for a cold-start GET request is 116.18 ms, whereas it is 125.5 ms in the subsequent request with the cached base time. Why is this?

Answer

There are many possible reasons for the subsequent transfer time to be larger than the first one.

First, the Internet is a dynamic conduit—network routes change, traffic density changes, and so on. Such factors can add substantial variance in the delays.

Second, the service infrastructure load is also fluid due to many conditions—such as current active users—so variance can arise from there as well.

Third, before processing a subsequent request, the server might consult its cache to check whether the response is already cached, taking some extra time. This would be ironic, because caches are added to reduce latency.

We'd need appropriate monitoring infrastructure to pinpoint the exact reason for latency variance. For this chapter's back-of-the-envelope calculations, though, such variance is acceptable.

The POST request latency#

Let's perform the same estimation for a POST request. A request to create a post is submitted to the server, with a size of around 37 KB. The request includes a media file, resulting in a larger request size. The response to such a request is a status code indicating the successful creation of the post. The response time for such a request is given in the following image:

The POST request time response (first request)

Let's calculate the latency for POST using the equations mentioned earlier:

RTT_{post} = 587.01\ ms - 4\ ms = 583.01\ ms \\ Time_{latency} = 217.44\ ms + 583.01\ ms + 5.03\ ms = 805.48\ ms \qquad (4)
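The same check for the POST numbers (variable names are ours; all values in milliseconds come from the screenshot and the previous lesson):

```python
# Measured values (ms) for the cold-start POST request
transfer_start = 587.01   # includes uploading the 37 KB body + RTT + processing
processing = 4.0          # minimum processing time from the previous lesson
base = 217.44             # DNS lookup + TCP and SSL handshakes
download = 5.03           # time to download the small status response

rtt_post = transfer_start - processing        # equation (2), applied to POST
latency = base + rtt_post + download          # equation (1)
print(round(rtt_post, 2), round(latency, 2))  # → 583.01 805.48
```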

Point to Ponder

Question

Above, while creating a post using a POST request, the request size is ~37 KB, but the response size of the GET request for 100 posts was approximately 28 KB. Why is there such a difference in size?

Answer

While creating a post, the data contains media files such as images, resulting in a larger data size. In contrast, when we fetch around 100 posts, the response contains only textual data. Any media files are fetched later from a CDN or other media servers, using metadata returned with the actual post data.

Latency of cached response#

Let's see what happens when the same request is sent again to the server using cached information.

The POST request time response (second request)

By putting values in equation (1) given above, we can calculate the latency of the request:

Time_{latency} = (0\ ms) + 537.28\ ms + 5.66\ ms = 542.94\ ms
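Putting the four results side by side makes the benefit of connection reuse concrete (values in milliseconds, taken from the calculations above):

```python
# Latency estimates (ms) computed earlier in this lesson
results = {
    "GET cold":  331.42,
    "GET warm":  128.32,
    "POST cold": 805.48,
    "POST warm": 542.94,
}
for name, ms in results.items():
    print(f"{name:9s} {ms:7.2f} ms")

# Fractional saving from skipping the base time on warm requests
get_saving = 1 - results["GET warm"] / results["GET cold"]
post_saving = 1 - results["POST warm"] / results["POST cold"]
print(round(get_saving, 2), round(post_saving, 2))  # → 0.61 0.33
```

The relative saving is smaller for POST because the upload time, not the base time, dominates its latency.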

Discussion#

Our experiments using Postman were not fully controlled. At best, they provide some estimates under one kind of circumstance. Estimates for the back-of-the-envelope calculations can live with such sloppiness because of the coarse-grained nature of design-level estimation. During our estimation of the latency of the two types of requests, we made the following observations:

  • The base time for requests roughly remains the same for different requests to the same server. Although the base time adds considerable latency, it is negligible for subsequent requests when actual data is being exchanged between the client and the server.

  • There is a significant difference between the transfer start times of the GET and POST requests. This is because the transfer start time of a GET request involves a basic HTTP request with a negligible body size. A POST request, on the other hand, requires sending data from the client side, and the transfer start ends only when the client receives the first byte after the server has successfully processed the request. Moreover, the upload speed from client to server (POST) is usually lower than the download speed from server to client (GET).

The transfer start comparison of GET and POST
  • We can deduce from the previous point that the size of the data affects the transfer time in the POST request, whereas it affects the download time in the GET request.

Apart from the observations above, it should be noted that the distance from the server is also a key factor. While performing our experiment, the client and server were located on two different continents, roughly 7,500 miles apart.

Keeping the distance in mind, the latency times for GET and POST requests are acceptable for elastic applications or services. However, for real-time services, the above latencies are unacceptable. Luckily, techniques are available (discussed in the subsequent lessons) to enable us to reduce the latency and hence the service’s response time.
